
Fixes #634 - Implemented data entry date option for TS data retrieval #927

Open
wants to merge 7 commits into base: develop
Conversation

zack-rma
Collaborator

Fixes #634 - Implements data entry date as an option for TimeSeries data retrieval - Serialization bug in progress

@@ -84,7 +83,7 @@

public class TimeSeriesController implements CrudHandler {
private static final Logger logger = Logger.getLogger(TimeSeriesController.class.getName());

private static final String INCLUDE_ENTRY_DATE = "include-entry-date";
Contributor

From a purely pedantic standpoint this should really be a parameter of the content-type, but I may have to accept the reality of this being easier for everyone.

@krowvin, we were just discussing this conceptually.

Collaborator

Path parameters are the "what" I am retrieving, query parameters are the "how" I am adding to or filtering that data, and content-type is the "shape" that is returned. This change affects the how, and given the way the column names and data array are already paired together to give this flexibility, I don't see a change in shape.

Contributor

Adding a column is definitely a change in shape. The what is a time series, the query parameters specify exactly which time series, or at least which portion of a given time series (I guess if we're being really pedantic begin and end should be in the fragment... but I digress).

While the flexibility is there, it's flexibility to change the shape. I don't totally disagree with you but given we haven't communicated that portion of the contract very well we are introducing a breaking change. We already have more than one downstream library dependent on these types.

I'm going to type up something on the wiki, or maybe a discussion, for the philosophy I'm going for with these; hopefully my argument makes more sense in regard to query vs content-type, especially as it relates to some of the challenges we're currently seeing.

Collaborator

Yeah, it is not conveyed very well that one should use the columns array to determine which index to grab from the data array. I don't think that adding information to the swagger docs to communicate that should trigger a content-type change though (maybe version=2.5 but that seems like a huge headache) especially since not including the parameter to retrieve entry date keeps the array intact for backwards compatibility.

Also, I've never seen an API where the content-type changes how much extra (or how little) data gets returned to the client. I'd like to see some examples. I also don't see the reason (pedantic or not) for adding begin/end as path parameters, as those are filters on the time series. Everything in the identifier of the time series encompasses the time series (which is also why the date version shouldn't be in the path and is a query parameter).

Contributor

NOTE: not arguing for it, but an explanation of my logic on the fragment.
A time series is:
/timeseries/Alder Springs.Precip-INC.Total.1Hour.1Hour.Calc?version_date=unversioned identifies a specific time series, which is a mathematical series of data - and technically all of it. The entire series could be considered a "document"; a fragment is a section within a document. Traditionally we would think of lines in a file, but it applies to the time series as well. A file, after all, is just a series of lines.

As I said, extremely pedantic.... admittedly almost to the point of being useless because literally no one does it that way, nor would they even if it could be proven objectively correct.

Technically the units are also representation and not identification, but given how little content-type features are used, it would be incredibly difficult to get people to use it; I don't even think the Swagger-UI has a mechanism to slightly tweak the content-type.

But back on the topic of what's correct for us, it seems we're all in agreement that @DanielTOsborne's initial design is already sufficiently flexible in the current scheme, and our failure was in how we documented that for the general end user.

So we leave the inclusion as a query parameter unless a better way actually comes along.

@@ -248,6 +256,11 @@ public static class Record {
@JsonProperty(value = "quality-code", index = 2)
int qualityCode;

Contributor

This might be better as a subclass. I know that adds some complexity, but eventually we may also want to include the version date and any attached text notes, and that would be a lot of logic in this one class.

Collaborator Author

Changed to a subclass. I added a custom deserializer to handle the different classes.

Contributor

Okay, so I said this, Adam said this, but after reading the code and our other discussions above, does it make more sense to just make TimeSeries more generic?

e.g.

A builder where you manually add column names, index, and type, and add functions in the row builder to set them?
something like

withColumn(int index, String name, String description, Class<T> type) {
"logic"
}

... Record:

<T> setColumn(int index, T value, Class<T> type) {
  "logic"
}

Or something like that. It would remove the need for TimeSeriesDaoImpl to have two different loops doing almost 90% of the same work, just a check for "I have this column requested, let's also add it."

Basically, instead of hard-coding the columns at all (okay, maybe time... it is a time series), the user of the given TimeSeries object (after it's set by the builder) can define them at run-time.

Sorry, I know you did a lot here, that just came to me now looking through the current PR.
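The builder idea above could be sketched roughly as follows. All names here (GenericTimeSeries, Column, withColumn, setColumn) are hypothetical illustrations of the comment's pseudocode, not the PR's actual API: columns are registered once at run-time, and rows set values by column index with a type check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GenericTimeSeries {

    public static final class Column {
        final int index;
        final String name;
        final String description;
        final Class<?> type;

        Column(int index, String name, String description, Class<?> type) {
            this.index = index;
            this.name = name;
            this.description = description;
            this.type = type;
        }
    }

    private final Map<Integer, Column> columns = new LinkedHashMap<>();
    private final List<Object[]> rows = new ArrayList<>();

    // Builder-side: register column metadata once, at run-time.
    public GenericTimeSeries withColumn(int index, String name, String description, Class<?> type) {
        columns.put(index, new Column(index, name, description, type));
        return this;
    }

    // Row-side: set a value by column index, with a type check against the registered column.
    public <T> GenericTimeSeries setColumn(Object[] row, int index, T value, Class<T> type) {
        Column col = columns.get(index);
        if (col == null || !col.type.isAssignableFrom(type)) {
            throw new IllegalArgumentException("no column " + index + " accepting " + type);
        }
        row[index] = value;
        return this;
    }

    public Object[] newRow() {
        Object[] row = new Object[columns.size()];
        rows.add(row);
        return row;
    }

    public List<Object[]> rows() {
        return rows;
    }
}
```

With a shape like this, the optional entry-date column would only be registered when requested, so the DAO would need a single loop rather than two near-identical ones.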

…dle custom output, adds data entry date support
@zack-rma zack-rma marked this pull request as ready for review October 25, 2024 18:53
@zack-rma zack-rma changed the title Fixes #634 - Implemented data entry data option for TS data retrieval Fixes #634 - Implemented data entry date option for TS data retrieval Oct 31, 2024
@MikeNeilson
Contributor

@jbkolze what are your thoughts on this? At least how it's done. Some of it appears it may be a breaking change, which we want to avoid.

@@ -623,7 +622,7 @@ private static TimeSeries buildTimeSeries(ILocationLevelRef levelRef, Interval i
if (qualityCode != null) {
quality = qualityCode.intValue();
}
timeSeries.addValue(dateTime, value, quality);
timeSeries.addValue(dateTime, value, quality, null);
Contributor

Are we sure this isn't a breaking change?

Collaborator Author

I'll double check with additional test cases, but this change should not break any existing functionality. A null data entry date parameter is treated as if a standard Time-Value-Quality data entry was provided. The implementation of addValue will use a TimeSeries data record with only three input fields under normal circumstances:

if (dataEntryDate != null) {
    values.add(new TimeSeriesRecordWithDate(dateTime, value, qualityCode, dataEntryDate));
} else {
    values.add(new Record(dateTime, value, qualityCode));
}

The existing use cases of TimeSeries should be unaffected, as they will be handled exactly as they were before these changes.

Collaborator

I recommend subclassing TimeSeries instead

Contributor

I think it would break CWMS.js; the JavaScript OpenAPI generator already has trouble with our TimeSeries class given some specific assumptions the generator chose to make.

@jbkolze

jbkolze commented Nov 4, 2024

@jbkolze what are your thoughts on this? At least how it's done. Some of it appears it may be a breaking change, which we want to avoid.

Conceptually, I don't personally have any qualms with it. You had mentioned in the typing discussion that you all were trying to leave flexibility for dynamically adjusting the time series values array, and this seems like a prime use case for that.

That being said, I am a little concerned about the part you marked as a breaking change. I don't know the CDA source that well, but is this indicating that the response would include a fourth null value even if the include_entry_date is false? Because that definitely would not be ideal -- we've written a lot of CDA code already (and I think other districts have as well) that would have to be updated. Not as big of a deal if an API version were included in the path (.../v3/...), but somewhat cumbersome in the current setup.

My understanding from previous conversations is that you'd get the normal 3-value array if this were set to false, but receive a 4-value array if include_entry_date is true. And the "value-columns" object would be updated to match. That'd be my "ideal" implementation.

@zack-rma
Collaborator Author

zack-rma commented Nov 4, 2024

You are correct that setting the include_entry_date parameter to true would result in a four-value array, whereas setting it to false would return a three-value array. Your "ideal" implementation is what I was aiming for to retain backwards compatibility and avoid breaking any of the other endpoints that rely on the TimeSeries implementation.

tsRecord.getValue(qualityNormCol).intValue()
)
);
if (includeEntryDate) {
Collaborator

This query is doubling the time it takes to retrieve time series. Can this replace the retrieve_ts_out_tab calls?

Collaborator Author

While it could replace the retrieve_ts_out_tab call above, doing so would require implementing trim support in the query, as that is currently handled by the retrieve_ts_out_tab call. I haven't quite figured out the best way to do so, so maybe we can discuss this in more detail.

@@ -159,7 +164,8 @@ public ZonedDateTime getEnd() {
}

// Use the array shape to optimize data transfer to client
@JsonFormat(shape=JsonFormat.Shape.ARRAY)
@JsonFormat(shape = JsonFormat.Shape.ARRAY)
@JsonDeserialize(contentUsing = TimeSeriesRecordDeserializer.class)
Collaborator

This method is overridden for XML using a Mixin; did you verify that behavior still works as intended?

Collaborator Author

I added a Mixin test that verifies that the XML tags for the value records have the appropriate labels. There are also a couple serialization/deserialization tests that verify that this works as intended.
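For context on why the array shape keeps coming up in this thread, here is a minimal sketch of what Jackson's Shape.ARRAY annotation does. The class, field names, and sample values are hypothetical stand-ins for the PR's actual Record, not its real code:

```java
import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ArrayShapeDemo {

    // Shape.ARRAY serializes the POJO as a positional array instead of an
    // object, which is why adding a column changes the client-visible shape.
    @JsonFormat(shape = JsonFormat.Shape.ARRAY)
    @JsonPropertyOrder({"date-time", "value", "quality-code"})
    public static class Record {
        @JsonProperty(value = "date-time", index = 0)
        public long dateTime = 1700000000123L;
        @JsonProperty(value = "value", index = 1)
        public double value = 4.5;
        @JsonProperty(value = "quality-code", index = 2)
        public int qualityCode = 0;
    }

    public static void main(String[] args) throws Exception {
        // Emits positions, not names: [1700000000123,4.5,0]
        System.out.println(new ObjectMapper().writeValueAsString(new Record()));
    }
}
```

Clients must pair the positions against the "value-columns" descriptor to know which index is which, which is the documentation gap discussed above.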


@JsonProperty(value = "value-columns")
@Schema(name = "value-columns", accessMode = AccessMode.READ_ONLY)
public List<Column> getValueColumnsJSON() {
return getColumnDescriptor();
return getColumnDescriptor((values != null && !values.isEmpty())
Collaborator

This seems like behavior for a subclass

Collaborator Author

Added to TimeSeries subclass

@@ -218,7 +228,16 @@ private List<Column> getColumnDescriptor() {
columns.add(new TimeSeries.Column(fieldName, fieldIndex + 1, f.getType()));
}
}

if (includeDataEntryDate) {
Collaborator

This could also be accomplished better with a subclass

Collaborator Author

Moved into subclass


// This class is used to deserialize the time-series data JSON into an object
// Solves the issue of the deserializer getting stuck in a loop
// and throwing a StackOverflowError when trying to handle the Record class directly
Collaborator

This seems sketchy to me, why is your custom deserializer causing this?

Collaborator Author

Removed custom deserializer

return jsonParser.getCodec().treeToValue(node, TimeSeriesRecordWithDate.class);
}
String nodeString = node.toString();
if (nodeString.startsWith("[")) {
Collaborator

A mixin doesn't solve the need for this custom parsing? All this logic looks like we're circumventing Jackson too much.

Collaborator Author

Removed custom serializer

Timestamp dateTime = Timestamp.from(Instant.ofEpochMilli(Long.parseLong(valList[0])));
double value = Double.parseDouble(valList[1]);
int quality = Integer.parseInt(valList[2]);
Timestamp entryDate = Timestamp.from(Instant.ofEpochMilli(Long.parseLong(valList[3])));
Collaborator

No need to convert from Instant to Timestamp; Timestamp's constructor takes epoch millis.
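A quick sketch of the simplification the reviewer is suggesting (the epoch-millis value is arbitrary; variable names are illustrative):

```java
import java.sql.Timestamp;
import java.time.Instant;

public class TimestampConstruction {
    public static void main(String[] args) {
        long epochMillis = 1_700_000_000_123L;
        // Round-trip through Instant, as in the PR code under review
        Timestamp viaInstant = Timestamp.from(Instant.ofEpochMilli(epochMillis));
        // Direct construction: java.sql.Timestamp(long) already takes epoch millis
        Timestamp direct = new Timestamp(epochMillis);
        System.out.println(viaInstant.equals(direct)); // true: same instant, same nanos
    }
}
```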

@Override
public TimeSeries.Record deserialize(JsonParser jsonParser, DeserializationContext deserializationContext) throws IOException {
JsonNode node = jsonParser.readValueAsTree();
if (node.get("data-entry-date") != null) {
Collaborator

This should be a constant

Contributor

It should probably also be ignored on input; data-entry-date is always set by the database itself, and an external system/user isn't allowed to change it.
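One way to get that behavior is Jackson's read-only property access, which serializes the field to clients but silently drops it on deserialization. A minimal sketch, with an illustrative class and a simplified long-valued field rather than the PR's actual record:

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EntryDateRecord {
    // READ_ONLY: written when serializing to the client, ignored on input,
    // so an external system/user cannot set the database-managed field.
    @JsonProperty(value = "data-entry-date", access = JsonProperty.Access.READ_ONLY)
    public long dataEntryDate;

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        EntryDateRecord in = mapper.readValue("{\"data-entry-date\": 42}", EntryDateRecord.class);
        System.out.println(in.dataEntryDate); // 0: the inbound value was ignored
    }
}
```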


@MikeNeilson
Contributor

My understanding from previous conversations is that you'd get the normal 3-value array if this were set to false, but receive a 4-value array if include_entry_date is true. And the "value-columns" object would be updated to match. That'd be my "ideal" implementation.

That is correct for the location indicated, which is location levels backed by a time series. I don't know how much it would affect what you're currently using, but it's definitely not ideal. At the least it definitely shouldn't be null; we may as well provide the data, but it seems like a parameter should be added to match on the location level endpoint.

But it sounds like you're on the same page with Adam about the shape already being explicitly flexible, so I'm okay with that section now; not "ideal", but what is? Definitely something to document better, though.

});
logger.fine(() -> query2.getSQL(ParamType.INLINED));
final TimeSeriesWithDate timeSeries = new TimeSeriesWithDate(timeseries);
query2.forEach(tsRecord -> timeSeries.addValue(
Contributor

We don't need to solve this now, but definitely should before we add requesting any text notes as well. There has to be a better way to handle this with the builders.

if (pageSize != 0) {
if (versionDate != null) {
whereCond = whereCond.and(AV_TSV_DQU.AV_TSV_DQU.VERSION_DATE.eq(versionDate == null ? null :
Contributor

Does this logic handle max version? Or will AV_TSV_DQU always return every version, or only the specifically requested version?


Successfully merging this pull request may close these issues.

Need Data Entry Date for Time Series Data
4 participants